28 research outputs found

    Saliency-Informed Spatio-Temporal Vector of Locally Aggregated Descriptors and Fisher Vector for Visual Action Recognition

    Feature encoding has been extensively studied for the task of visual action recognition (VAR). The recently proposed super vector-based encoding methods, such as the Vector of Locally Aggregated Descriptors (VLAD) and the Fisher Vector (FV), have significantly improved recognition performance. Despite this success, they still struggle with the superfluous information present during the training stage, which makes them computationally expensive when applied to a large number of extracted features. To address this challenge, this paper proposes a Saliency-Informed Spatio-Temporal VLAD (SST-VLAD) approach that selects the extracted features corresponding to a small number of videos in the data set by considering both spatial and temporal video-wise saliency scores; the same extension principle has also been applied to the FV approach. The experimental results indicate that the proposed feature encoding scheme consistently outperforms existing ones at significantly lower computational cost.
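
    As a rough sketch of the encoding side, the Python/NumPy code below implements plain VLAD with a saliency-based pre-selection step in front of it; the selection rule and the keep_ratio parameter are illustrative assumptions, not the authors' SST-VLAD algorithm.

        import numpy as np

        def vlad_encode(features, codebook):
            """Plain VLAD: aggregate residuals of descriptors to their nearest codeword."""
            K, D = codebook.shape
            # nearest-codeword assignment for each of the N descriptors
            assign = np.argmin(
                ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1), axis=1)
            vlad = np.zeros((K, D))
            for k in range(K):
                members = features[assign == k]
                if len(members):
                    vlad[k] = (members - codebook[k]).sum(axis=0)
            vlad = vlad.ravel()
            # power normalisation followed by L2, as is standard for VLAD/FV
            vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
            return vlad / (np.linalg.norm(vlad) + 1e-12)

        def saliency_informed_vlad(per_video_features, saliency_scores, codebook, keep_ratio=0.1):
            """Encode only descriptors from the most salient videos (hypothetical rule)."""
            order = np.argsort(saliency_scores)[::-1]        # most salient videos first
            n_keep = max(1, int(keep_ratio * len(order)))    # keep a small fraction
            kept = np.vstack([per_video_features[i] for i in order[:n_keep]])
            return vlad_encode(kept, codebook)

    Pre-selecting features before the assignment step is where the computational saving claimed in the abstract would come from, since codeword assignment dominates the cost as the number of descriptors grows.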

    Unifying Person and Vehicle Re-Identification

    Person and vehicle re-identification (re-ID) are important challenges for the analysis of the burgeoning collection of urban surveillance videos. To efficiently evaluate such videos, which are populated with both vehicles and pedestrians, it would be preferable to have one unified framework with effective performance across both domains. Unfortunately, due to the contrasting composition of humans and vehicles, no architecture has yet been established that can adequately perform both tasks. We release a Person and Vehicle Unified Data Set (PVUD), comprising both pedestrians and vehicles from popular existing re-ID data sets, in order to better model the data we would expect to find in the real world. We exploit the generalisation ability of metric learning to propose a re-ID framework that can learn to re-identify humans and vehicles simultaneously. We design our network, MidTriNet, to harness the power of mid-level features to develop better representations for the re-ID tasks. We help the system handle mixed data by appending unification terms, with additional hard negative and hard positive mining, to MidTriNet. We attain accuracy comparable between training on PVUD and training on its constituent data sets separately, supporting the system's generalisation power. To further demonstrate the effectiveness of our framework, we also obtain results better than, or competitive with, the state of the art on each of the Market-1501, CUHK03, VehicleID and VeRi data sets.
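
    The unification terms of MidTriNet are not detailed in the abstract, but the hard positive and hard negative mining it builds on typically reduces to the batch-hard triplet loss sketched below in PyTorch; the margin value is an assumption.

        import torch

        def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
            """For each anchor, mine the hardest positive (same identity, farthest)
            and hardest negative (different identity, closest) within the batch."""
            dist = torch.cdist(embeddings, embeddings, p=2)    # (B, B) pairwise distances
            same = labels[:, None] == labels[None, :]          # identity match mask
            hardest_pos = dist.masked_fill(~same, 0.0).max(dim=1).values
            hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
            return torch.relu(hardest_pos - hardest_neg + margin).mean()

    Because a batch drawn from PVUD mixes pedestrian and vehicle identities, the same mining machinery applies to both domains unchanged, which is one way the unified training described in the abstract could be realised.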

    Makeup Style Transfer on Low-quality Images with Weighted Multi-scale Attention

    Facial makeup style transfer is an extremely challenging sub-field of image-to-image translation. Due to this difficulty, state-of-the-art results mostly rely on the Face Parsing Algorithm, which segments a face into parts in order to easily extract makeup features. However, this algorithm only works well on high-definition images where facial features can be accurately extracted. Faces in many real-world photos, such as those including a large background or multiple people, are typically of low resolution, which considerably hinders state-of-the-art algorithms. In this paper, we propose an end-to-end holistic approach to effectively transfer makeup styles between two low-resolution images. The idea is built upon a novel weighted multi-scale spatial attention module, which identifies salient pixel regions in low-resolution images at multiple scales and uses channel attention to determine the most effective attention map. This design provides two benefits: low-resolution images are usually blurry to different extents, so a multi-scale architecture can select the most effective convolution kernel size to implement spatial attention; and makeup is applied at both a macro level (foundation, fake tan) and a micro level (eyeliner, lipstick), so different scales can excel at extracting different makeup features. We develop an Augmented CycleGAN network that embeds our attention modules at selected layers to most effectively transfer makeup. Our system is tested on the FBD data set, which consists of many low-resolution facial images, and we demonstrate that it outperforms state-of-the-art methods, particularly in transferring makeup for blurry and partially occluded images.
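
    A minimal PyTorch sketch of one plausible reading of the module follows: spatial attention maps are computed at several kernel sizes and blended with weights produced by a channel-attention gate. The kernel sizes, activation choices, and gating layout are assumptions for illustration, not the paper's exact design.

        import torch
        import torch.nn as nn

        class MultiScaleSpatialAttention(nn.Module):
            """Spatial attention at several kernel sizes, blended by a learned
            per-scale weighting derived from global channel statistics."""
            def __init__(self, channels, kernel_sizes=(3, 5, 7)):
                super().__init__()
                # one single-channel spatial attention map per scale
                self.branches = nn.ModuleList(
                    nn.Conv2d(channels, 1, k, padding=k // 2) for k in kernel_sizes)
                # channel-attention gate scoring each scale
                self.scale_gate = nn.Sequential(
                    nn.AdaptiveAvgPool2d(1),
                    nn.Conv2d(channels, len(kernel_sizes), 1),
                    nn.Softmax(dim=1))

            def forward(self, x):
                # per-scale spatial maps in [0, 1]: (B, S, 1, H, W)
                maps = torch.stack([torch.sigmoid(b(x)) for b in self.branches], dim=1)
                # per-scale weights: (B, S, 1, 1, 1)
                w = self.scale_gate(x).unsqueeze(2)
                attn = (maps * w).sum(dim=1)   # weighted blend -> (B, 1, H, W)
                return x * attn

    Letting the gate choose among kernel sizes mirrors the paper's motivation: blurrier inputs should favour larger receptive fields, while fine makeup details favour smaller ones.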

    Expression of neurturin, glial cell line-derived neurotrophic factor, and their receptor components

    PURPOSE. Dysregulation of neurturin (NTN) expression has been linked to photoreceptor apoptosis in a mouse model of inherited retinal degeneration. To investigate the extent to which any such dysregulation depends on the nature of the apoptotic trigger, the expression of NTN, glial cell line-derived neurotrophic factor (GDNF), and their corresponding receptor components was compared in a rat model of light-induced retinal degeneration. METHODS. Retinal expression of NTN, GDNF, their corresponding receptors GFRα-2 and GFRα-1, the transmembrane receptor tyrosine kinase Ret, and cSrc-p60, a member of the cytoplasmic protein-tyrosine kinase family, was analyzed by Western blot and immunocytochemistry in cyclic light- and dark-reared rats in the presence and absence of intense light exposure. RESULTS. All components for NTN-mediated signaling activation are present in rat photoreceptors and retinal pigment epithelium, the cells primarily affected by light-induced damage. The expression levels of GDNF, its receptor components, and NTN were not affected by light-induced stress. However, GFRα-2 expression strikingly increased with the extent of retinal damage, especially at the photoreceptors, in contrast to the decreased levels observed previously in an inherited degeneration model. CONCLUSIONS. The present study indicates that the expression of receptors of the GDNF family is independently regulated in normal and light-damaged rat retina, and, in conjunction with previous work, suggests that the pattern of modulation of these genes during photoreceptor degeneration is determined by the nature of the apoptotic trigger. Such differential responses to different modes of retinal degeneration may reflect influences of the neurotrophic system on photoreceptor survival or on the regulation of neuronal plasticity. (Invest Ophthalmol Vis Sci. 2004;45:1240-1246) DOI: 10.1167/iovs.03-1122

    GDNF and neurturin (NTN) are members of the glial cell line-derived neurotrophic factor (GDNF) family of ligands (GFLs), a group of neurotrophic factors. GFLs have been shown to influence the development of enteric, sympathetic, parasympathetic, and sensory neurons (for review see Ref. 1). They generally signal through a multicomponent receptor system consisting of the receptor tyrosine kinase Ret and a high-affinity ligand-binding glycosyl-phosphatidylinositol (GPI)-linked coreceptor (GFRα). GDNF-mediated bioactivity involves signaling molecules of the Src family of protein-tyrosine kinases; in particular, p60Src has been shown to interact with activated Ret.2 GDNF and NTN are expressed in a wide variety of tissues including the retina, suggesting an implication in diverse biological processes.6 Upregulation of NTN mRNA expression was associated with progressive retinal neurodegeneration, but GFRα-2 mRNA levels remained lower than in age-matched nondegenerative control retinas. On the assumption that increased NTN expression is a survival-promoting response of the retina to the onset of degeneration, its potential neurotrophic effect on photoreceptors might be constrained by the persistently low GFRα-2 levels in rd retinas. Alternatively, because NTN also signals through the GDNF receptor (GFRα-1), albeit through a low-affinity interaction,1 it is possible that increased NTN is limited in its efficacy by a failure to activate sufficient survival-promoting pathways through the GFRα-1 receptors.

    To assess the extent to which such modulations of expression of GFL members and their receptors depend on the nature of the apoptotic trigger, we compared the expression patterns of NTN, GDNF, and their receptor components in a model of photoreceptor cell death induced by exposure to intense light. In rats, light-induced retinal damage is rhodopsin-mediated and depends on light intensity, wavelength and duration of the exposure, the period of dark adaptation before exposure, and the exposure schedule.8-12 We studied the effects of both the type I (damaging both the photoreceptors and the retinal pigment epithelium) and type II (characterized by the loss of visual cells only) light-induced damage regimens on the expression of two members of the GDNF family. The retinal distributions of NTN, GDNF, and their receptor components were assessed by immunoblot and immunocytochemistry in control and light-stressed rat retinas.

    The Usher 1B protein, MYO7A, is required for normal localization and function of the visual retinoid cycle enzyme, RPE65

    Mutations in the MYO7A gene cause a deaf-blindness disorder known as Usher syndrome 1B. In the retina, the majority of MYO7A is in the retinal pigmented epithelium (RPE), where many of the reactions of the visual retinoid cycle take place. We have observed that the retinas of Myo7a-mutant mice are resistant to acute light damage. In exploring the basis of this resistance, we found that Myo7a-mutant mice have lower levels of RPE65, the RPE isomerase that has a key role in the retinoid cycle. We show for the first time that RPE65 normally undergoes a light-dependent translocation to become more concentrated in the central region of the RPE cells. This translocation requires MYO7A, so that, in Myo7a-mutant mice, RPE65 is partly mislocalized in the light. RPE65 is degraded more quickly in Myo7a-mutant mice, perhaps due to its mislocalization, providing a plausible explanation for its lower levels. Following a 50–60% photobleach, Myo7a-mutant retinas exhibited increased all-trans-retinyl ester levels during the initial stages of dark recovery, consistent with a deficiency in RPE65 activity. Lastly, MYO7A and RPE65 were co-immunoprecipitated from RPE cell lysate by antibodies against either of the proteins, and the two proteins were partly colocalized, suggesting a direct or indirect interaction. Together, the results support a role for MYO7A in the translocation of RPE65, illustrating the involvement of a molecular motor in the spatiotemporal organization of the retinoid cycle in vision.

    3D Gaussian descriptor for video-based person re-identification

    Although often considered less challenging than image-based person re-identification (re-id), video-based person re-id is still appealing as it mimics a more realistic scenario, owing to the availability of pedestrian sequences from surveillance cameras. In order to exploit the temporal information provided, a number of feature extraction methods have been proposed. Although the features could equally be learned, at a significantly higher computational cost, the scarce nature of labelled re-id data sets encourages the development of robust hand-crafted feature representations as an efficient alternative, especially when novel distance metrics or multi-shot ranking algorithms are to be validated. This paper presents a novel hand-crafted feature representation for video-based person re-id based on a 3-dimensional hierarchical Gaussian descriptor. Compared to similar approaches, the proposed descriptor (i) does not require any walking-cycle extraction, hence avoiding the complexity of this task, (ii) can easily be fed into off-the-shelf learned distance metrics, and (iii) consistently achieves superior performance regardless of the matching method adopted. The performance of the proposed method was validated on the PRID2011 and iLIDS-VID data sets, outperforming similar methods on both benchmarks.
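
    As a sketch of the core building block, the NumPy code below computes a single-level Gaussian descriptor: a set of features is summarised by its mean and covariance, embedded as a symmetric positive-definite (SPD) matrix, log-mapped, and flattened. The paper's hierarchical, 3-dimensional construction would stack such embeddings over spatio-temporal regions; the details here are illustrative, not the authors' exact formulation.

        import numpy as np

        def gaussian_descriptor(features, eps=1e-6):
            """Summarise (N, D) features as one Gaussian and flatten its SPD embedding."""
            mu = features.mean(axis=0)
            D = features.shape[1]
            cov = np.cov(features, rowvar=False) + eps * np.eye(D)
            # standard (D+1)x(D+1) SPD embedding of the Gaussian N(mu, cov)
            P = np.empty((D + 1, D + 1))
            P[:D, :D] = cov + np.outer(mu, mu)
            P[:D, D] = mu
            P[D, :D] = mu
            P[D, D] = 1.0
            # matrix logarithm so descriptors can be compared with Euclidean metrics
            vals, vecs = np.linalg.eigh(P)
            logP = vecs @ np.diag(np.log(np.maximum(vals, eps))) @ vecs.T
            # the matrix is symmetric, so the upper triangle suffices
            return logP[np.triu_indices(D + 1)]

    Because the log-map linearises the SPD manifold, the flattened vector can be compared with Euclidean operations, which is consistent with property (ii) in the abstract.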

    Triplet Loss with Channel Attention for Person Re-identification

    The triplet loss function has seen extensive use within person re-identification. Most works focus on either improving the mining algorithm or adding new terms to the loss function itself. Our work instead concentrates on two other core components of the triplet loss that have been under-researched. First, we improve the standard Euclidean distance with dynamic weights, which are selected based on the standard deviation of features across the batch. Second, we exploit channel attention via a squeeze-and-excitation unit in the backbone model to emphasise important features throughout all layers of the model. This ensures that the output feature vector is a better representation of the image and is also more suitable for use within our dynamically weighted Euclidean distance function. We demonstrate that our alterations provide significant performance improvements across popular re-identification data sets, including an almost 10% mAP improvement on the CUHK03 data set. The proposed model attains results competitive with many state-of-the-art person re-identification models.
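
    As a rough illustration of the two components, the following PyTorch sketch pairs a Euclidean distance whose per-dimension weights are derived from the batch-wise standard deviation of features with a standard squeeze-and-excitation unit; the weight normalisation and the reduction ratio are assumptions for illustration, not the paper's exact formulation.

        import torch
        import torch.nn as nn

        def weighted_euclidean(a, b, batch_features, eps=1e-8):
            # Per-dimension weights from the spread of each feature across the
            # batch; normalising them to sum to 1 is an assumption.
            std = batch_features.std(dim=0)
            w = std / (std.sum() + eps)
            return torch.sqrt(((a - b) ** 2 * w).sum(dim=-1) + eps)

        class SEUnit(nn.Module):
            """Squeeze-and-excitation channel attention for the backbone."""
            def __init__(self, channels, reduction=16):
                super().__init__()
                self.gate = nn.Sequential(
                    nn.AdaptiveAvgPool2d(1),                       # squeeze: global context
                    nn.Conv2d(channels, channels // reduction, 1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(channels // reduction, channels, 1),
                    nn.Sigmoid())                                  # excitation: channel gates

            def forward(self, x):
                return x * self.gate(x)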